
Search in the Catalogues and Directories

Hits 1 – 14 of 14

1
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
In: https://hal.inria.fr/hal-03550289 ; 2022
Abstract: 8 pages plus appendix and references ; In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficient documentation and tools for analysis. Mindful of these pitfalls, we present our methodology for a documentation-first, human-centered data collection project as part of the BigScience initiative. We identified a geographically diverse set of target language groups (Arabic, Basque, Chinese, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. To structure this effort, we developed our online catalogue as a supporting tool for gathering metadata through organized public hackathons. We present our development process; analyses of the resulting resource metadata, including distributions over languages, regions, and resource types; and our lessons learned in this endeavor.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Applications; Collaborative Resource Construction & Crowdsourcing; LR Infrastructures and Architectures; Systems; Tools
URL: https://hal.inria.fr/hal-03550289
BASE
2
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In: Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022), May 2022, Dublin, France ; https://hal.inria.fr/hal-03639144
BASE
3
Catalogue records of photographs (1850-1950) ...
British Library. - : British Library, 2021
BASE
4
Early English alliterative poems in the West-Midland dialect of the fourteenth century : copied and edited from a unique manuscript in the library of the British museum /
Morris, Richard, 1833-1894.; British Library. Manuscript. Cotton Nero A. x. - : London : Published for the Early English text society by H. Milford, 1934
BASE
5
A Coptic palimpsest containing Joshua, Judges, Ruth, Judith and Esther in the Sahidic dialect /
BASE
6
Texts relating to Saint Mêna of Egypt and canons of Nicaea in Nubian dialect, with facsimile.
Budge, E. A. Wallis (Ernest Alfred Wallis), Sir, 1857-1934.; British Library. Manuscript. Oriental 6805.; British Museum. Department of Egyptian and Assyrian Antiquities. - : London, Printed by order of the Trustees, sold at the British museum and by Longmans and co. [etc.], 1909
BASE
7
Endangered Archives Programme
http://eap.bl.uk/
Topic: Ethnolinguistics
Language: Achinese; Javanese; Lepcha; ...
Source type: Corpora; Full-text server / Archives
Access: free access
8
International Dunhuang Project: The silkroad online
http://www.silkroadfoundation.org/newsletter/vol3num2/2_whitfield.php
Topic: Graphemics; History of language
Language: Chinese language; Sanskrit; Syriac; ...
Research type: Research projects
Access: free access
9
Incunabula Short Title Catalogue (ISTC)
https://data.cerl.org/istc/_search
Topic: History of language
Source type: Catalogues
Access: free access
10
British National Bibliography (BNB)
https://bl.natbib-lod.org/
Topic: Applied linguistics; Computational linguistics; History of language; ...
Source type: Bibliographies
Access: free access
11
Codex Sinaiticus
https://codexsinaiticus.org/de/
Topic: Translation science
Language: Greek, Ancient
Research type: Research projects
Access: free access
12
Digitised Manuscripts
http://www.bl.uk/manuscripts/
Language: Anglo-Norman; Dutch, Middle; English; ...
Source type: Bibliographies; Full-text server / Archives
Access: free access
13
British Accents and Dialects - The British Library
http://www.bl.uk/learning/langlit/sounds/
Topic: Dialectology / Linguistic geography; Sociolinguistics
Language: English, British; English, Indian; English, Irish; ...
Source type: Introductions / Tutorials
Access: free access
14
Sounds Familiar? - Glossary
https://www.bl.uk/teaching-resources/british-accents-and-dialects-glossary
Topic: Dialectology / Linguistic geography; Sociolinguistics
Language: English, British
Source type: Dictionaries; Introductions / Tutorials
Access: free access

Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 8
Open access documents: 6
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings